{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 正则表达\n",
    "\n",
    "## 1.单个字符\n",
    "\n",
    "| 字符  | 含义                                                   |\n",
    "| ----- | ------------------------------------------------------ |\n",
    "| `.`   | 匹配任意1个字符（除了`\\n`）                            |\n",
    "| `[]` | 匹配`[ ]`中列举的1个字符（`^`可以取反）                   |\n",
    "| `\\d`  | 匹配数字（`0~9`）                                      |\n",
    "| `\\D`  | 匹配非数字（`非数字`）                                 |\n",
    "| `\\s`  | 匹配空白（`空格、Tab键、回车`）                            |\n",
    "| `\\S`  | 匹配非空白                                             |\n",
    "| `\\w`  | 匹配单词字符，即`a-z、A-Z、0-9、_`（包括单个中文字符） |\n",
    "| `\\W`  | 匹配非单词字符                                         |\n",
    "\n",
    "\n",
    "注意：\n",
    "1. `\\s`并不匹配`\"\"`\n",
    "2. `<re.Match object; span=(0, 1), match='\\t'>`\n",
    "    - PS：`match=xxx`，就是我们`ret.group()`的结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义一个通用测试方法\n",
    "import re\n",
    "\n",
    "def my_match(re_str, input_str):\n",
    "    ret = re.match(re_str, input_str)\n",
    "    if ret:\n",
    "        print(f\"[匹配结果:{ret.group()}]\")\n",
    "    else:\n",
    "        print(f\"[{input_str}不匹配]\")\n",
    "    return ret\n",
    "\n",
    "# Python中字符串前面加上 r 表示原生字符串（不转义）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果: ]\n",
      "[匹配结果:\t]\n",
      "[匹配结果:\n",
      "]\n",
      "[不匹配]\n"
     ]
    }
   ],
   "source": [
    "# \\s 验证\n",
    "\n",
    "# 空格匹配验证\n",
    "my_match(\"\\s\",\" \")\n",
    "# Tab键匹配验证\n",
    "my_match(\"\\s\",\"\\t\")\n",
    "# 回车匹配验证\n",
    "my_match(\"\\s\",\"\\n\")\n",
    "\n",
    "# 不匹配验证：（空字符串）\n",
    "my_match(\"\\s\",\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:1]\n",
      "[匹配结果:1]\n",
      "[11不匹配]\n"
     ]
    }
   ],
   "source": [
    "# \\d 验证\n",
    "\n",
    "# 匹配单个数字\n",
    "my_match(\"\\d\",\"1\") # 一点要变成字符串\n",
    "\n",
    "# 多个数字则只能匹配一个字符\n",
    "my_match(\"\\d\",\"11\") # 注意\n",
    "\n",
    "# 解决：以^开头，以$结尾\n",
    "my_match(\"^\\d$\",\"11\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[4不匹配]\n",
      "[匹配结果:7]\n",
      "[5不匹配]\n",
      "[匹配结果:7]\n",
      "[匹配结果:b]\n",
      "[匹配结果:B]\n",
      "[_不匹配]\n"
     ]
    }
   ],
   "source": [
    "# [] 验证\n",
    "\n",
    "# 不是1、2、3则不匹配\n",
    "my_match(\"[1-3]\",\"4\")\n",
    "\n",
    "# 匹配1~3,6~9\n",
    "my_match(\"[1-36-9]\",\"7\")\n",
    "# 不匹配验证\n",
    "my_match(\"[1-36-9]\",\"5\")\n",
    "\n",
    "# 只匹配数字和字母（大小写）\n",
    "my_match(\"[\\da-zA-Z]\",\"7\")\n",
    "my_match(\"[\\da-zA-Z]\",\"b\")\n",
    "my_match(\"[\\da-zA-Z]\",\"B\")\n",
    "# 不匹配验证\n",
    "my_match(\"[\\da-zA-Z]\",\"_\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:3]\n",
      "[匹配结果:@]\n",
      "[4不匹配]\n",
      "[匹配结果:7]\n",
      "[匹配结果:#]\n",
      "[5不匹配]\n"
     ]
    }
   ],
   "source": [
    "# [] 取反扩展\n",
    "# \\d ==> [0-9]\n",
    "# \\D ==> [^0-9]\n",
    "\n",
    "# 非2、4、6\n",
    "my_match(\"[^246]\",\"3\")\n",
    "my_match(\"[^246]\",\"@\")\n",
    "# 错误验证\n",
    "my_match(\"[^246]\",\"4\")\n",
    "\n",
    "# 非 1~6\n",
    "my_match(\"[^1-6]\",\"7\")\n",
    "my_match(\"[^1-6]\",\"#\")\n",
    "# 错误验证\n",
    "my_match(\"[^1-6]\",\"5\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:滚]\n",
      "[\n",
      "不匹配]\n",
      "[匹配结果:\t]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 1), match='\\t'>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# \\w在UTF8下会匹配中文的验证：\n",
    "my_match(\"\\w\",\"滚\")\n",
    "\n",
    "# .匹配任意字符，不包括\\n的验证：\n",
    "my_match(\".\",\"\\n\")\n",
    "\n",
    "# 除了\\n，可以匹配任意一个字符\n",
    "my_match(\".\",\"\\t\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 1), match='\\n'>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 扩展\n",
    "\n",
    "# 如果想让.支持\\n，再多传个flag：re.S\n",
    "re.match(\".\",\"\\n\",re.S)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.多个字符\n",
    "\n",
    "| 字符    | 含义                      |\n",
    "| ------- | ------------------------- |\n",
    "| `*`     | 1个字符出现次数：`>=0`    |\n",
    "| `+`     | 1个字符出现次数：`>=1`    |\n",
    "| `?`     | 1个字符出现次数：`1 or 0` |\n",
    "| `{m}`   | 1个字符出现`m`次          |\n",
    "| `{m,n}` | 1个字符出现从`[m,n]`次    |\n",
    "| `\\`     | 转义特殊字符              |\n",
    "| `^`     | 匹配字符串开头            |\n",
    "| `$`     | 匹配字符串结尾            |\n",
    "\n",
    "多个字符，一般都是以`^`开头，以`$`结尾，不然容易出Bug（`re.match`方法默认以`^`开头）\n",
    "\n",
    "PS：`vi`命令模式下，输入`^`和`$`，光标会跳转到头和尾\n",
    "\n",
    "**Python中`r\"\"`代表不转义字符串**:\n",
    "```py\n",
    "# r\"\"，如果包含转义字符\\就容易出错了，这时候r\"\"就上场了\n",
    "re.match(\"\\\\mmd\",\"\\mmd\")\n",
    "\n",
    "# 原因分析\n",
    "# \\是有特殊含义的，想要没有特殊含义就再加个\\，\n",
    "# 那加上的这个\\又有特殊含义，所以就蛋疼了，r\"\"这时候就上场了\n",
    "\n",
    "# 解决方法\n",
    "re.match(r\"\\\\mmd\",\"\\\\mmd\") # 下面有案例\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:]\n",
      "[匹配结果:11]\n",
      "[不匹配]\n",
      "[匹配结果:1]\n",
      "[匹配结果:11]\n",
      "[11不匹配]\n",
      "[匹配结果:]\n",
      "[匹配结果:1]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 1), match='1'>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# * 、 + 、?\n",
    "\n",
    "# *：0个或者多个\n",
    "my_match(r\"\\d*\",\"\")\n",
    "my_match(r\"\\d*\",\"11\")\n",
    "\n",
    "# +：1个或者多个\n",
    "# 不匹配验证：\n",
    "my_match(r\"\\d+\",\"\")\n",
    "# 匹配验证：\n",
    "my_match(r\"\\d+\",\"1\")\n",
    "my_match(r\"\\d+\",\"11\")\n",
    "\n",
    "\n",
    "# ?：0个或者1次\n",
    "# 不匹配验证：\n",
    "my_match(r\"^\\d?$\",\"11\")\n",
    "# 匹配验证\n",
    "my_match(r\"^\\d?$\",\"\")\n",
    "my_match(r\"^\\d?$\",\"1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:1]\n",
      "[匹配结果:]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 0), match=''>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 为什么用^和$包裹，看下面两个奇葩案例就知道了\n",
    "\n",
    "my_match(\"\\d\",\"123333\")\n",
    "\n",
    "my_match(\"\\d*\",\"a\") # ==> \"a\" ==> \"\"\"a\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:7]\n",
      "[匹配结果:17]\n",
      "[777不匹配]\n",
      "[匹配结果:1234567890]\n",
      "[123456789不匹配]\n",
      "[A123456789不匹配]\n"
     ]
    }
   ],
   "source": [
    "# {}指定位数验证\n",
    "# ? ==> {0,1}\n",
    "\n",
    "# 1位数字或者2位数字\n",
    "my_match(r\"^\\d{1,2}$\",\"7\")\n",
    "# 1位数字或者2位数字\n",
    "my_match(r\"^\\d{1,2}$\",\"17\")\n",
    "# 错误验证\n",
    "my_match(r\"^\\d{1,2}$\",\"777\")\n",
    "\n",
    "# 指定位数 eg:10位数字\n",
    "my_match(r\"^\\d{10}$\",\"1234567890\")\n",
    "# 错误验证 ~ 9位\n",
    "my_match(r\"^\\d{10}$\",\"123456789\")\n",
    "# 错误验证 ～ 非整数\n",
    "my_match(r\"^\\d{10}$\",\"A123456789\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:123]\n",
      "[匹配结果:1234]\n",
      "[12不匹配]\n"
     ]
    }
   ],
   "source": [
    "# {} 扩展\n",
    "# * ==> {0,}\n",
    "# + ==> {1,}\n",
    "\n",
    "# \\d 至少3个\n",
    "my_match(r\"\\d{3,}\",\"123\")\n",
    "my_match(r\"\\d{3,}\",\"1234\")\n",
    "# 错误验证\n",
    "my_match(r\"\\d{3,}\",\"12\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:a_bbp]\n",
      "[匹配结果:_]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 1), match='_'>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ^ $ 案例\n",
    "\n",
    "# 验证变量命名\n",
    "my_match(r\"^[a-zA-z_]\\w*$\",\"a_bbp\")\n",
    "my_match(r\"^[a-zA-z_]\\w*$\",\"_\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mmd\n"
     ]
    }
   ],
   "source": [
    "# 测试一个就知道为什么用\\w了\n",
    "def test蛋():\n",
    "    print(\"mmd\")\n",
    "\n",
    "test蛋() # Python Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:a_b]\n",
      "[匹配结果:mmd@qq.com]\n",
      "[a_b#w不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 如果没有加开头和结尾的Bug测试\n",
    "\n",
    "# 没有判断结尾的Bug案例\n",
    "my_match(r\"[a-zA-z_]\\w*\",\"a_b#w\")\n",
    "\n",
    "# 测试Bug，这个也匹配了\n",
    "my_match(r\"[a-zA-z_]+@qq.com\",\"mmd@qq.comcom\")\n",
    "\n",
    "# 改进 ~ 现在不匹配了\n",
    "my_match(r\"^[a-zA-z_]\\w*$\",\"a_b#w\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:mmd@qq#com]\n",
      "[mmd@qq#comcom不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 转义字符 \\ 引入案例\n",
    "\n",
    "# 测试Bug，这个也匹配了\n",
    "my_match(r\"[a-zA-z_]+@qq.com\",\"mmd@qq#comcom\")\n",
    "\n",
    "# 改进 ~ 现在不匹配了 (开头结尾+\\转义)\n",
    "my_match(r\"^[a-zA-z_]+@qq\\.com$\",\"mmd@qq#comcom\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bad escape \\m at position 0\n",
      "[匹配结果:\\mmd]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<re.Match object; span=(0, 4), match='\\\\mmd'>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# r\"\"，如果包含转义字符\\就容易出错了，这时候r\"\"就上场了\n",
    "try:\n",
    "    my_match(\"\\\\mmd\",\"\\mmd\")\n",
    "except Exception as ex:\n",
    "    print(ex)\n",
    "\n",
    "# 原因分析\n",
    "# \\是有特殊含义的，想要没有特殊含义就再加个\\，\n",
    "# 那加上的这个\\又有特殊含义，所以就蛋疼了，r\"\"这时候就上场了\n",
    "\n",
    "# 解决方法\n",
    "my_match(r\"\\\\mmd\", \"\\\\mmd\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.其他字符\n",
    "\n",
    "| 字符           | 含义                             |\n",
    "| -------------- | -------------------------------- |\n",
    "|  \\|            | 匹配左右任意一个表达式         |\n",
    "| `\\b`           | 匹配一个单词的边界(字母与空格间的位置) |\n",
    "| `\\B`\t         | 匹配非单词的边界                |\n",
    "| `( )`          | 将括号中字符作为一个分组        |\n",
    "| `\\num`         | 引用分组num匹配到的字符串       |\n",
    "| *`(?P<name>)`* | 分组起别名                    |\n",
    "| *`(?P=name)`*  | 引用别名为name分组匹配到的字符串 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:dotnet]\n",
      "[匹配结果:dotnet]\n",
      "[dotnetcrazy不匹配]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['dotnet', 'aspnet']"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 匹配边界\n",
    "\n",
    "# \\b 匹配以net结尾的单词\n",
    "my_match(r\"\\w+net\\b\",\"dotnet\")\n",
    "my_match(r\"\\w+net\\b\",\"dotnet crazy\")\n",
    "\n",
    "# 不匹配验证\n",
    "my_match(r\"\\w+net\\b\",\"dotnetcrazy\")\n",
    "\n",
    "# 后面会讲\n",
    "re.findall(r\"\\w+net\\b\",\"dotnet crazy aspnet\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[dot net crazy不匹配]\n",
      "[匹配结果:dot net]\n",
      "[匹配结果:dotnet]\n",
      "[匹配结果:dotnet]\n",
      "[匹配结果:dotnet]\n",
      "[dotnet#crazy不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 不匹配验证：\\b、\\B、^、$只是代表边界，并不表示空格\n",
    "my_match(r\"\\w+\\bnet\\b\",\"dot net crazy\")\n",
    "\n",
    "# 正确修改\n",
    "my_match(r\"\\w+\\s\\bnet\\b\",\"dot net crazy\")\n",
    "\n",
    "# 把上面换成\\B，则代表单词间必须是 非空格的字符\n",
    "my_match(r\"\\w+\\Bnet\\B\",\"dotnetcrazy\")\n",
    "my_match(r\"\\w+\\Bnet\\B\",\"dotnetAcrazy\")\n",
    "my_match(r\"\\w+\\Bnet\\B\",\"dotnet1crazy\")\n",
    "\n",
    "# 不匹配验证\n",
    "my_match(r\"\\w+\\Bnet\\B\",\"dotnet#crazy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:小明]\n",
      "[匹配结果:小张]\n",
      "[小潘不匹配]\n"
     ]
    }
   ],
   "source": [
    "# | 匹配左右任意一个表达式\n",
    "\n",
    "# 匹配小明或者小张\n",
    "my_match(r\"^小明|小张$\",\"小明\")\n",
    "my_match(r\"^小明|小张$\",\"小张\")\n",
    "# 不匹配验证\n",
    "my_match(r\"^小明|小张$\",\"小潘\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:mmd@163.com]\n",
      "[匹配结果:<h1>萌萌哒</h1>]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'h1'"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# () 将括号中字符作为一个分组\n",
    "\n",
    "# group(1) 返回第1个括号匹配内容\n",
    "my_match(r\"^[a-zA-Z0-9_]+@(qq|163)\\.com$\",\"mmd@163.com\").group(1)\n",
    "\n",
    "# HTML的标签匹配匹配检查\n",
    "my_match(r\"^<([a-zA-Z1-9]+)>.*</\\1>$\",\"<h1>萌萌哒</h1>\").group(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:<p><font>我去</font></p>]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "('p', 'font', '我去')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# groups返回所有的匹配结果\n",
    "my_match(r\"^<([a-zA-Z1-9]+)><([a-zA-Z1-9]+)>(.*)</\\2></\\1>$\",\"<p><font>我去</font></p>\").groups()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:mmd@qq.com]\n",
      "('mmd', 'qq')\n",
      "[匹配结果:mmd@163.com]\n",
      "('mmd', '163')\n",
      "[@163.com不匹配]\n",
      "[mmd@123.com不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 匹配 qq.com 和 163.com (别忘记转义.)\n",
    "ret = my_match(r\"(^[a-zA-Z0-9_]+)@(qq|163)\\.com$\",\"mmd@qq.com\")\n",
    "print(ret.groups())\n",
    "\n",
    "ret = my_match(r\"(^[a-zA-Z0-9_]+)@(qq|163)\\.com$\",\"mmd@163.com\")\n",
    "print(ret.groups())\n",
    "\n",
    "# 不匹配验证\n",
    "my_match(r\"^(^[a-zA-Z0-9_]+)@(qq|163)\\.com$\",\"@163.com\")\n",
    "my_match(r\"^(^[a-zA-Z0-9_]+)@(qq|163)\\.com$\",\"mmd@123.com\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[匹配结果:<html><h1>萌萌哒</h1></html>]\n",
      "[<html><h1>萌萌哒</h2></html>不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 别名案例（不常用）\n",
    "my_match(r\"<(?P<mmd>\\w*)><(?P<dnt>.*)>.*</(?P=dnt)></(?P=mmd)>\",\"<html><h1>萌萌哒</h1></html>\").group(2)\n",
    "\n",
    "# 不匹配验证\n",
    "my_match(r\"<(?P<mmd>\\w*)><(?P<dnt>.*)>.*</(?P=dnt)></(?P=mmd)>\",\"<html><h1>萌萌哒</h2></html>\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 练练手"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0不匹配]\n",
      "[匹配结果:7]\n",
      "[匹配结果:77]\n",
      "[07不匹配]\n",
      "[777不匹配]\n",
      "[匹配结果:0]\n",
      "[匹配结果:1]\n",
      "[匹配结果:70]\n",
      "[匹配结果:100]\n",
      "[07不匹配]\n",
      "[170不匹配]\n",
      "[700不匹配]\n"
     ]
    }
   ],
   "source": [
    "# 1~100之间的数字：(1,100)\n",
    "my_match(r\"^[1-9]\\d?$\",\"0\")\n",
    "my_match(r\"^[1-9]\\d?$\",\"7\") # 十位只能是1~9\n",
    "my_match(r\"^[1-9]\\d?$\",\"77\")\n",
    "# 不匹配验证\n",
    "my_match(r\"^[1-9]\\d?$\",\"07\")\n",
    "my_match(r\"^[1-9]\\d?$\",\"777\")\n",
    "\n",
    "# 0~100的数字：[0,100]\n",
    "re_str=r\"^([1-9]?\\d?|100)$\" # ^([1-9]\\d?|100|0)$\n",
    "my_match(re_str,\"0\")\n",
    "my_match(re_str,\"1\")\n",
    "my_match(re_str,\"70\")\n",
    "my_match(re_str,\"100\")\n",
    "# 不匹配验证\n",
    "my_match(re_str,\"07\")\n",
    "my_match(re_str,\"170\")\n",
    "my_match(re_str,\"700\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.Python扩展\n",
    "\n",
    "上面的都是通用系列，下面的才能体现为啥爬虫是Python的优势：\n",
    "1. `re.match`：和其他语言用法一致（默认从头开始匹配）\n",
    "2. `re.search`：匹配第一个并返回（如果加了`^`和`$`就和match一样了）\n",
    "3. **`re.findall`：返回所有匹配的列表**\n",
    "4. **`re.sub`：将匹配到的数据进行替换，再返回新的字符串**\n",
    "    - 匹配之后替换成默认值\n",
    "    - 匹配之后进行函数处理\n",
    "5. `re.split`：正则切割函数（类似于字符串的split）\n",
    "6. *`re.compile`：正则字符串编译成正则表达式对象*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "# 匹配第一个就结束了\n",
    "ret = re.search(r\"\\d\",\"我的名字叫小明，今年23,88\")\n",
    "print(ret.group())\n",
    "\n",
    "# 如果加了开头结尾就和match一样了\n",
    "print(re.search(r\"^\\d$\",\"我的名字叫小明，今年23,88\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['2', '3', '8', '8']"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 返回所有匹配的列表\n",
    "re.findall(r\"\\d\",\"我的名字叫小明，今年23，88\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['我的名字叫小明', '今年23', '88']"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "re.split(r\"，|。\",\"我的名字叫小明，今年23。88\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'我上次买的时候***.***块，现在***就拿到了，差评！'"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sub案例：批量替换1\n",
    "re.sub(r\"\\d+\",\"***\",\"我上次买的时候98.5块，现在30就拿到了，差评！\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'我是小明，客服电话是：4006789688'"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sub案例：批量替换2 ~ 拿到分组内容并进行处理\n",
    "re.sub(r\"(\\d+)\",r\"400\\1\",\"我是小明，客服电话是：6789688\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'我上次买的时候196.0.10.0块，现在60.0就拿到了，差评！'"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sub案例：函数处理\n",
    "def shit_test(result):\n",
    "    # 返回类型必须是str\n",
    "    return str(float(result.group())*2)\n",
    "\n",
    "re.sub(r\"\\d+\",shit_test,\"我上次买的时候98.5块，现在30就拿到了，差评！\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ABZ\n",
      "ACZ\n"
     ]
    }
   ],
   "source": [
    "# 扩展内容\n",
    "\n",
    "pattern = re.compile(r\"A.*Z\",re.S) # 表达式复用\n",
    "\n",
    "print(re.match(pattern,\"ABZ\").group())\n",
    "print(re.match(pattern,\"ACZ\").group())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 练手小案例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Python', 'Golang', 'NetCore', 'JavaScript']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['Python', 'Golang', 'NetCore', 'JavaScript']"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 提取单词\n",
    "input_str = \"Python Golang NetCore JavaScript\"\n",
    "\n",
    "print(re.split(\" \",input_str))\n",
    "\n",
    "re.findall(\"[a-zA-Z]+\",input_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'职位描述岗位职责：1.负责公司数据管理制度、规范、流程的设计，参与数据开发平台的建设和管理2.规划数据仓库工作方向，持续提升团队工作目标和工作效率3.负责全面了解公司业务，进行深层次的数据分析，为数据开发项目提供指导性的意见，从数据角度为公司产品开发、业务运营提供决策支持建议4.掌握业界技术动向，组织研究大数据相关前沿技术，用于指导实际的数据支持项目任职要求：1.精通数据仓库实施理论，生命周期管理，具备大型互联网数据仓库架构设计、模型设计、ETL设计经验，以及海量数据处理和优化经验2.深入理解Hadoop/Hive/Spark/Storm/Kylin等大数据相关技术和原理3.有实际使用Hive/MR/Spark等大数据处理技术解决大数据相关问题的项目经验，具备丰富的性能调优经验4熟悉OLAP工具和数据分析技能，对数据敏感，能够进行数据分析，挖掘数据价值5.逻辑思维能力强，有较强的学习能力和创新思维，能够解决复杂的商业问题6.优秀的沟通能力和文字表达能力，有较强的团队管理能力'"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 提取文字 \"\"\" 保留字符串原始格式\n",
    "html_str = \"\"\"\n",
    "<div>\n",
    "    <h3>职位描述</h3>\n",
    "    <div>\n",
    "    岗位职责： <br>1. 负责公司数据管理制度、规范、流程的设计，参与数据开发平台的建设和管理<br>2. 规划数据仓库工作方向，持续提升团队工作目标和工作效率<br>3. 负责全面了解公司业务，进行深层次的数据分析，为数据开发项目提供指导性的意见，从数据角度为公司产品开发、业务运营提供决策支持建议<br>4. 掌握业界技术动向，组织研究大数据相关前沿技术，用于指导实际的数据支持项目<br>任职要求：<br>1. 精通数据仓库实施理论，生命周期管理，具备大型互联网数据仓库架构设计、模型设计、ETL设计经验，以及海量数据处理和优化经验<br>2. 深入理解Hadoop/Hive/Spark/Storm/Kylin等大数据相关技术和原理<br>3. 有实际使用Hive/MR/Spark等大数据处理技术解决大数据相关问题的项目经验，具备丰富的性能调优经验<br>4 熟悉OLAP工具和数据分析技能，对数据敏感，能够进行数据分析，挖掘数据价值<br>5. 逻辑思维能力强，有较强的学习能力和创新思维，能够解决复杂的商业问题<br>6. 优秀的沟通能力和文字表达能力，有较强的团队管理能力\n",
    "    </div>\n",
    "</div>\n",
    "\"\"\"\n",
    "\n",
    "# 清除HTML标签（`/?`：`/`出现0次或者1次）\n",
    "re.sub(r\"</?\\w+>|\\n| \",\"\",html_str).strip()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.贪婪模式\n",
    "\n",
    "正则表达式默认就是贪婪模式，只要符合表达式就尽可能去匹配（eg：`.+`、`.*`）\n",
    "\n",
    "**解决方法：后面加个`?`（eg：`.+?`、`.*?`）**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[提取的号码为:] 8\n",
      "[贪婪的字符串:] 我叫小明，欢迎拨打客服：400678967\n",
      "[提取的号码为:] 4006789678\n",
      "[贪婪的字符串:] 我叫小明，欢迎拨打客服：\n"
     ]
    }
   ],
   "source": [
    "# 贪婪演示\n",
    "\n",
    "# 贪婪模式下会尽可能匹配:\n",
    "input_str = \"我叫小明，欢迎拨打客服：4006789678\"\n",
    "ret = re.match(r\"(.+)(\\d+)\", input_str)\n",
    "print(\"[提取的号码为:]\",ret.group(2))\n",
    "print(\"[贪婪的字符串:]\",ret.group(1))\n",
    "\n",
    "# 解决方法 .+? or .*?\n",
    "ret = re.match(r\"(.+?)(\\d+)\", input_str)\n",
    "print(\"[提取的号码为:]\",ret.group(2))\n",
    "print(\"[贪婪的字符串:]\",ret.group(1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 练手小案例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "        岗位职责： <br>1. 负责公司数据管理制度、规范、流程的设计，参与数据开发平台的建设和管理<br>2. 规划数据仓库工作方向，持续提升团队工作目标和工作效率<br>3. 负责全面了解公司业务，进行深层次的数据分析，为数据开发项目提供指导性的意见，从数据角度为公司产品开发、业务运营提供决策支持建议<br>4. 掌握业界技术动向，组织研究大数据相关前沿技术，用于指导实际的数据支持项目<br>任职要求：<br>1. 精通数据仓库实施理论，生命周期管理，具备大型互联网数据仓库架构设计、模型设计、ETL设计经验，以及海量数据处理和优化经验<br>2. 深入理解Hadoop/Hive/Spark/Storm/Kylin等大数据相关技术和原理<br>3. 有实际使用Hive/MR/Spark等大数据处理技术解决大数据相关问题的项目经验，具备丰富的性能调优经验<br>4 熟悉OLAP工具和数据分析技能，对数据敏感，能够进行数据分析，挖掘数据价值<br>5. 逻辑思维能力强，有较强的学习能力和创新思维，能够解决复杂的商业问题<br>6. 优秀的沟通能力和文字表达能力，有较强的团队管理能力\n",
      "        \n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'岗位职责：1.负责公司数据管理制度、规范、流程的设计，参与数据开发平台的建设和管理2.规划数据仓库工作方向，持续提升团队工作目标和工作效率3.负责全面了解公司业务，进行深层次的数据分析，为数据开发项目提供指导性的意见，从数据角度为公司产品开发、业务运营提供决策支持建议4.掌握业界技术动向，组织研究大数据相关前沿技术，用于指导实际的数据支持项目任职要求：1.精通数据仓库实施理论，生命周期管理，具备大型互联网数据仓库架构设计、模型设计、ETL设计经验，以及海量数据处理和优化经验2.深入理解Hadoop/Hive/Spark/Storm/Kylin等大数据相关技术和原理3.有实际使用Hive/MR/Spark等大数据处理技术解决大数据相关问题的项目经验，具备丰富的性能调优经验4熟悉OLAP工具和数据分析技能，对数据敏感，能够进行数据分析，挖掘数据价值5.逻辑思维能力强，有较强的学习能力和创新思维，能够解决复杂的商业问题6.优秀的沟通能力和文字表达能力，有较强的团队管理能力'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 加强版提取案例 ~ BOSS\n",
    "\n",
    "html_str = \"\"\"\n",
    "<div class=\"detail-content\">\n",
    "    <div class=\"job-sec\">\n",
    "        <h3>职位描述</h3>\n",
    "        <div class=\"text\">\n",
    "        岗位职责： <br>1. 负责公司数据管理制度、规范、流程的设计，参与数据开发平台的建设和管理<br>2. 规划数据仓库工作方向，持续提升团队工作目标和工作效率<br>3. 负责全面了解公司业务，进行深层次的数据分析，为数据开发项目提供指导性的意见，从数据角度为公司产品开发、业务运营提供决策支持建议<br>4. 掌握业界技术动向，组织研究大数据相关前沿技术，用于指导实际的数据支持项目<br>任职要求：<br>1. 精通数据仓库实施理论，生命周期管理，具备大型互联网数据仓库架构设计、模型设计、ETL设计经验，以及海量数据处理和优化经验<br>2. 深入理解Hadoop/Hive/Spark/Storm/Kylin等大数据相关技术和原理<br>3. 有实际使用Hive/MR/Spark等大数据处理技术解决大数据相关问题的项目经验，具备丰富的性能调优经验<br>4 熟悉OLAP工具和数据分析技能，对数据敏感，能够进行数据分析，挖掘数据价值<br>5. 逻辑思维能力强，有较强的学习能力和创新思维，能够解决复杂的商业问题<br>6. 优秀的沟通能力和文字表达能力，有较强的团队管理能力\n",
    "        </div>\n",
    "    </div>\n",
    "    <div class=\"job-sec\">pass</div>\n",
    "    <div class=\"job-sec\">xx</div>\n",
    "    <div class=\"job-sec company-info\"pass</div>\n",
    "    <div class=\"job-sec\">pass</div>\n",
    "</div>\n",
    "\"\"\"\n",
    "\n",
    "# 先找到第一个job-sec（正则思路：直接定位匹配，写几个关键词，其他都是偷懒写法.*?）\n",
    "ret = re.search(r'<div.*?job-sec\">.*?text\">(.*?)</div>', html_str, re.S)\n",
    "new_str = ret.group(1)\n",
    "print(new_str)\n",
    "\n",
    "# 再处理下多余的HTML标签\n",
    "re.sub(r\"<br>|\\s\",\"\",new_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "岗位职责：1.为政企客户和合作伙伴提供腾讯互联网+整体解决方案技术层面的售前架构咨询服务；2.为政府、企业提供腾讯大数据等项目的规划、咨询服务，协助合作伙伴及产品部门进行大数据等项目的落地；3.配合BD等团队发展生态合作伙伴，将腾讯能力与合作伙伴方案进行方案融合，为合作伙伴提供咨询、培训、方案融合服务；4.针对客户互联网+需求，深度定制互联网+解决方案并制定实施计划，把握全局项目进度，协调相关资源、协助实施团队完成方案Demo系统搭建，PoC测试及项目落地工作；5.负责互联网+案例、技术方案的更新维护，以及布道工作。\n",
      "岗位要求：1.本科以上学历，5年（硕士3年）以上大数据等售前咨询相关的工作经验；2.熟悉Hadoop、Spark等开源大数据技术体系，熟悉Oracle、PostgreSQL等数据库。要求至少有3个以上政企大数据项目规划与落地经验；3.具有宏观思维，有高层汇报能力。熟悉医疗、公安等行业大数据优先；4.具备优秀的文档能力，清晰明了地表达架构意图，能够熟练编写各类技术文档；5.良好的沟通、协调及资源整合能力；6.有针对行业ISV的渠道支持经验优先。\n"
     ]
    }
   ],
   "source": [
    "# 再来一例 ～ 拉勾\n",
    "\n",
    "html_str = \"\"\"\n",
    "<dd class=\"job_bt\">\n",
    "        <h3 class=\"description\">职位描述：</h3>\n",
    "        <div>\n",
    "        <p>岗位职责：<br>1. 为政企客户和合作伙伴提供腾讯互联网+整体解决方案技术层面的售前架构咨询服务；&nbsp;<br>2. 为政府、企业提供腾讯大数据等项目的规划、咨询服务，协助合作伙伴及产品部门进行大数据等项目的落地；&nbsp;<br>3. 配合BD等团队发展生态合作伙伴，将腾讯能力与合作伙伴方案进行方案融合，为合作伙伴提供咨询、培训、方案融合服务；&nbsp;<br>4. 针对客户互联网+需求，深度定制互联网+解决方案并制定实施计划，把握全局项目进度，协调相关资源、协助实施团队完成方案Demo系统搭建，PoC测试及项目落地工作；&nbsp;<br>5. 负责互联网+案例、技术方案的更新维护，以及布道工作。</p>\n",
    "<p><br>岗位要求：<br>1. 本科以上学历，5年（硕士3年）以上大数据等售前咨询相关的工作经验；&nbsp;<br>2. 熟悉Hadoop、Spark等开源大数据技术体系，熟悉Oracle、PostgreSQL等数据库。要求至少有3个以上政企大数据项目规划与落地经验；&nbsp;<br>3. 具有宏观思维，有高层汇报能力。熟悉医疗、公安等行业大数据优先；&nbsp;<br>4. 具备优秀的文档能力，清晰明了地表达架构意图，能够熟练编写各类技术文档；&nbsp;<br>5. 良好的沟通、协调及资源整合能力；&nbsp;<br>6. 有针对行业ISV的渠道支持经验优先。</p>\n",
    "        </div>\n",
    "    </dd>\n",
    "\"\"\"\n",
    "\n",
    "# 匹配需要的内容（正则思路：快速定位，然后.*?偷懒写法走起）\n",
    "ret = re.search('<h3.*?p>(.*?)</p>.*?p>(.*?)</p>',html_str,re.S)\n",
    "\n",
    "# 再处理下多余的HTML标签\n",
    "for item in (ret.group(1),ret.group(2)):\n",
    "    print(re.sub(r\"\\s|&nbsp;|<br>\", \"\", item))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 简单测试模块导入"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 6 µs, sys: 1 µs, total: 7 µs\n",
      "Wall time: 11.2 µs\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 14 µs, sys: 0 ns, total: 14 µs\n",
      "Wall time: 17.4 µs\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "from re import match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 186 ms, sys: 3.98 ms, total: 190 ms\n",
      "Wall time: 189 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# 小测试，重复导入100万次re模块大概需要200ms\n",
    "for i in range(1000000):\n",
    "    import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.08 s, sys: 548 µs, total: 1.08 s\n",
      "Wall time: 1.08 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# 小测试，重复导入100万次re模块的match方法，大概需要1s\n",
    "for i in range(1000000):\n",
    "    from re import match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.49 s, sys: 8.05 ms, total: 1.5 s\n",
      "Wall time: 1.5 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# 小测试，重复导入100万次re模块并进行简单匹配，需要1.5s左右\n",
    "for i in range(1000000):\n",
    "    import re\n",
    "    re.match(\"\",\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.8 s, sys: 216 µs, total: 2.8 s\n",
      "Wall time: 2.8 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# 小测试，重复导入100万次re模块的match方法并进行简单匹配，需要2.8s左右\n",
    "for i in range(1000000):\n",
    "    from re import match\n",
    "    match(\"\",\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
